Unsupervised Domain Adaptation for Word Sense Disambiguation using Stacked Denoising Autoencoder
Authors
Abstract
In this paper, we propose an unsupervised domain adaptation method for Word Sense Disambiguation (WSD) using a Stacked Denoising Autoencoder (SdA). SdA is an unsupervised learning method that uses a neural network to obtain an abstract feature set from the input data. This abstract feature set absorbs the differences between domains, so SdA can address the domain adaptation problem. However, SdA does not help with every domain adaptation problem. In particular, the difficulty of domain adaptation for WSD depends on the combination of the source domain, the target domain, and the target word, so any domain adaptation method for WSD has an adverse effect on some part of the problem. We therefore define a similarity between two domains and use it to judge whether or not to apply SdA, which avoids the adverse effect of SdA. In our experiments, we used three domains from the Balanced Corpus of Contemporary Written Japanese and 16 target words. Compared with the baseline, our method achieved higher average accuracy for all combinations of two domains, and it also outperformed conventional domain adaptation methods.
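The paper itself includes no code, but as a rough illustration of the greedy layer-wise pre-training an SdA performs, here is a minimal numpy sketch. The layer sizes, corruption rate, learning rate, and number of epochs are illustrative assumptions, not values reported in the paper; the "abstract feature set" in the abstract corresponds to the top-layer encoding returned by abstract_features. The input X is assumed to be an (n_examples, n_features) matrix with values in [0, 1], e.g. bag-of-words indicators.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class DenoisingAutoencoder:
        """One denoising autoencoder: corrupt the input, encode, decode, reconstruct."""

        def __init__(self, n_in, n_hidden, corruption=0.3, lr=0.1):
            self.W = rng.normal(0.0, 0.01, size=(n_in, n_hidden))
            self.b = np.zeros(n_hidden)   # hidden bias
            self.c = np.zeros(n_in)       # reconstruction bias
            self.corruption = corruption
            self.lr = lr

        def encode(self, x):
            return sigmoid(x @ self.W + self.b)

        def train_step(self, x):
            # Masking noise: randomly zero out a fraction of the input features.
            mask = rng.random(x.shape) > self.corruption
            x_tilde = x * mask
            h = self.encode(x_tilde)
            z = sigmoid(h @ self.W.T + self.c)      # decoder with tied weights
            # Gradients of the cross-entropy reconstruction loss w.r.t. W, b, c.
            dz = z - x
            dh = (dz @ self.W) * h * (1.0 - h)
            self.W -= self.lr * (x_tilde.T @ dh + dz.T @ h) / len(x)
            self.b -= self.lr * dh.mean(axis=0)
            self.c -= self.lr * dz.mean(axis=0)

    def train_sda(X, layer_sizes=(500, 100), epochs=50):
        """Greedy layer-wise pre-training: each layer denoises the previous layer's code."""
        layers, inp = [], X
        for n_hidden in layer_sizes:
            da = DenoisingAutoencoder(inp.shape[1], n_hidden)
            for _ in range(epochs):
                da.train_step(inp)
            inp = da.encode(inp)          # hidden code becomes the next layer's input
            layers.append(da)
        return layers

    def abstract_features(layers, X):
        """Map raw feature vectors to the top-layer abstract representation."""
        h = X
        for da in layers:
            h = da.encode(h)
        return h

In the setting the abstract describes, the layers would presumably be pre-trained on unlabeled feature vectors from both the source and target domains, and the WSD classifier would be trained on source-domain labels over abstract_features only when the domain similarity judges SdA to be helpful.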
Similar resources
Deep Nonlinear Feature Coding for Unsupervised Domain Adaptation
Deep feature learning has recently emerged with demonstrated effectiveness in domain adaptation. In this paper, we propose a Deep Nonlinear Feature Coding framework (DNFC) for unsupervised domain adaptation. DNFC builds on the marginalized stacked denoising autoencoder (mSDA) to extract rich deep features. We introduce two new elements to mSDA: domain divergence minimization by Maximum Mean Dis...
Use of Combined Topic Models in Unsupervised Domain Adaptation for Word Sense Disambiguation
Topic models can be used in unsupervised domain adaptation for Word Sense Disambiguation (WSD). In the domain adaptation task, three types of topic models are available: (1) a topic model constructed from the source domain corpus; (2) a topic model constructed from the target domain corpus; and (3) a topic model constructed from both domains. Basically, three topic features made from each to...
Marginalized Stacked Denoising Autoencoders
Stacked Denoising Autoencoders (SDAs) [4] have been used successfully in many learning scenarios and application domains. In short, denoising autoencoders (DAs) train one-layer neural networks to reconstruct input data from partial random corruption. The denoisers are then stacked into deep learning architectures where the weights are fine-tuned with back-propagation. Alternatively, the outputs...
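The snippet above stops mid-sentence, but the key point of mSDA is that the denoiser has a closed-form solution, so no back-propagation or sampling of corrupted copies is needed. A minimal numpy sketch of that closed form, based on the standard mSDA formulation (the corruption probability, number of layers, and the small ridge term are illustrative choices, not taken from this page), could look like this:

    import numpy as np

    def mda_layer(X, p=0.5):
        """One marginalized denoising autoencoder layer, computed in closed form.

        X: (d, n) matrix of n examples with d features (columns are examples).
        p: probability that each input feature is dropped (corrupted) to zero.
        Returns W of shape (d, d + 1) mapping a corrupted input (plus bias)
        back to its expected reconstruction.
        """
        d, n = X.shape
        Xb = np.vstack([X, np.ones((1, n))])      # append a constant bias feature
        q = np.full(d + 1, 1.0 - p)
        q[-1] = 1.0                               # the bias feature is never corrupted
        S = Xb @ Xb.T                             # scatter matrix
        Q = S * np.outer(q, q)                    # E[x_tilde x_tilde^T], off-diagonal terms
        np.fill_diagonal(Q, q * np.diag(S))       # diagonal terms scale by q_i only once
        P = S[:d, :] * q                          # E[x x_tilde^T], bias row dropped
        # W = P Q^{-1}; a small ridge term keeps Q well conditioned.
        W = np.linalg.solve(Q + 1e-5 * np.eye(d + 1), P.T).T
        return W

    def msda_features(X, n_layers=3, p=0.5):
        """Stack layers and concatenate the raw input with every layer's tanh output."""
        reps, h = [X], X
        for _ in range(n_layers):
            W = mda_layer(h, p)
            hb = np.vstack([h, np.ones((1, h.shape[1]))])
            h = np.tanh(W @ hb)
            reps.append(h)
        return np.vstack(reps)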
Learning Entity Representation for Entity Disambiguation
We propose a novel entity disambiguation model, based on Deep Neural Network (DNN). Instead of utilizing simple similarity measures and their disjoint combinations, our method directly optimizes document and entity representations for a given similarity measure. Stacked Denoising Auto-encoders are first employed to learn an initial document representation in an unsupervised pre-training stage. ...
HIT-CIR: An Unsupervised WSD System Based on Domain Most Frequent Sense Estimation
This paper presents an unsupervised system for all-word domain specific word sense disambiguation task. This system tags target word with the most frequent sense which is estimated using a thesaurus and the word distribution information in the domain. The thesaurus is automatically constructed from bilingual parallel corpus using paraphrase technique. The recall of this system is 43.5% on SemEv...
Journal:
Volume / Issue:
Pages: -
Publication date: 2015